UNBIASED ESTIMATES WITH MULTIPLE INDEPENDENT PROBABILITIES OF SELECTION


Jack Ogus


Occasionally, a given sampling unit has two or more independent chances of being selected. This situation arises, for example, in the Annual Survey of Manufactures when several companies in the original sampling frame merge to form one new company. In such cases, if any of the new company's original units had been selected, an inflated value, $X'$, for the entire company is included in the sample estimate; otherwise, it is not included in the sample. This procedure is unbiased if $EX' = X$, where X is the new company's value.

Two unbiased estimates of the company's value, X, may be defined:

$$X_1' = \frac{X}{M}\sum_{i=1}^{M}\frac{a_i}{p_i} \qquad (1)$$

and

$$X_2' = X\,\frac{1 - \prod_{i=1}^{M}(1 - a_i)}{1 - \prod_{i=1}^{M} q_i}, \qquad (2)$$

where $p_1, p_2, \ldots, p_M$ are the respective independent probabilities of selecting each of the original M companies which are combined; $q_i = 1 - p_i$; $a_i$ is 1 if the company was selected from the i-th original source and 0 if not selected; and $E a_i = p_i$. Intuitively, we have chosen $X_2'$ in preference to $X_1'$ when we had to choose between them. This note justifies that preference.

Both estimates are unbiased, for

$$E X_1' = \frac{X}{M}\sum_{i=1}^{M}\frac{p_i}{p_i} = \frac{XM}{M} = X$$

and

$$E X_2' = X\,\frac{1 - \prod_{i=1}^{M} q_i}{1 - \prod_{i=1}^{M} q_i} = X.$$

The estimate with the smaller variance, therefore, would be preferred. The respective variances are

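$$\sigma^2(X_1') = \frac{X^2}{M^2}\sum_{i=1}^{M}\frac{\operatorname{Var}(a_i)}{p_i^2} = \frac{X^2}{M^2}\sum_{i=1}^{M}\frac{q_i}{p_i} \qquad (5)$$

and, since $\prod_{i}(1 - a_i)$ is a 0-1 variable equal to 1 with probability $\prod_{i} q_i$,

$$\sigma^2(X_2') = X^2\,\frac{\operatorname{Var}\left[\prod_{i=1}^{M}(1 - a_i)\right]}{\left(1 - \prod_{i=1}^{M} q_i\right)^2} = X^2\,\frac{\prod_{i=1}^{M} q_i}{1 - \prod_{i=1}^{M} q_i}. \qquad (6)$$

From (5) and (6), then,

$$\sigma^2(X_1') - \sigma^2(X_2') = X^2\left[\frac{1}{M^2}\sum_{i=1}^{M}\frac{q_i}{p_i} - \frac{\prod_{i=1}^{M} q_i}{1 - \prod_{i=1}^{M} q_i}\right], \qquad (7)$$

and $\sigma^2(X_1') > \sigma^2(X_2')$ if and only if

$$A = \left(1 - \prod_{i=1}^{M} q_i\right)\sum_{i=1}^{M}\frac{q_i}{p_i} - M^2\prod_{i=1}^{M} q_i > 0. \qquad (8)$$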

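Now the sum of the probabilities that either 0, 1, ..., m, ..., M of the original units will have been selected is

$$\sum_{m=0}^{M}\Pr\{m \text{ units selected}\} = \prod_{i=1}^{M}(q_i + p_i) = 1,$$

so that, expanding the product (with $0 < p_i < 1$),

$$1 - \prod_{i=1}^{M} q_i = \prod_{i=1}^{M} q_i\left[\sum_{i}\frac{p_i}{q_i} + \sum_{i<j}\frac{p_i p_j}{q_i q_j} + \cdots + \prod_{i=1}^{M}\frac{p_i}{q_i}\right]. \qquad (9)$$

Substituting for $(1 - \prod q_i)$ from (9) into (8),

$$A = \prod_{i=1}^{M} q_i\left[\left(\sum_{i}\frac{p_i}{q_i}\right)\left(\sum_{i}\frac{q_i}{p_i}\right) - M^2 + \left(\sum_{i<j}\frac{p_i p_j}{q_i q_j} + \cdots\right)\sum_{i}\frac{q_i}{p_i}\right]. \qquad (10)$$

Using the Cauchy-Schwarz inequality,

$$\left(\sum_{i}\frac{p_i}{q_i}\right)\left(\sum_{i}\frac{q_i}{p_i}\right) \ge \left(\sum_{i=1}^{M} 1\right)^2 = M^2.$$

Then, since the remaining terms in (10) are nonnegative, and strictly positive when $M \ge 2$, we have $A > 0$ and

$$\sigma^2(X_1') > \sigma^2(X_2'). \qquad (11)$$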

Hence, if costs and other relevant operational and administrative factors are substantially the same for both estimates, $X_1'$ would be "inadmissible."
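As a numerical check of (1), (2), (5) and (6), the sketch below enumerates all $2^M$ selection outcomes for an assumed set of probabilities $p_i$ (illustrative values, not survey data) and computes exact means and variances of both estimates with X = 1. The function name `exact_moments` is our own.

```python
from itertools import product
from math import prod

def exact_moments(p, X=1.0):
    """Exact mean and variance of X1' (eq. 1) and X2' (eq. 2) over all 2^M outcomes a."""
    M, q = len(p), [1 - pi for pi in p]
    acc = {"X1'": [0.0, 0.0], "X2'": [0.0, 0.0]}          # running E[.] and E[.^2]
    for a in product((0, 1), repeat=M):
        w = prod(pi if ai else qi for ai, pi, qi in zip(a, p, q))   # P(this outcome)
        values = {
            "X1'": X / M * sum(ai / pi for ai, pi in zip(a, p)),
            "X2'": X * (1 - prod(1 - ai for ai in a)) / (1 - prod(q)),
        }
        for key, val in values.items():
            acc[key][0] += w * val
            acc[key][1] += w * val ** 2
    return {key: (m, m2 - m ** 2) for key, (m, m2) in acc.items()}

p = [0.2, 0.5, 0.1]                                          # illustrative probabilities
q = [1 - pi for pi in p]
moments = exact_moments(p)
var1 = sum(qi / pi for pi, qi in zip(p, q)) / len(p) ** 2   # formula (5), X = 1
var2 = prod(q) / (1 - prod(q))                               # formula (6), X = 1
print(moments, var1, var2)
```

With these values the two variances are about 1.56 and 0.56, in line with (11).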


A METHOD FOR MONTHLY SEASONAL ADJUSTMENT COMBINING THE USE OF MONTHLY AND EARLIER QUARTERLY DATA

Nigel Nettheim


Suppose a series has been observed quarterly for n years, after which it has been observed monthly for k years. Assume the series contains a stable seasonal factor.

THE ADDITIVE MODEL

Firstly, let the seasonal factors be additive. Let $\hat Q_1, \hat Q_2, \hat Q_3, \hat Q_4$ be the usual estimates of the seasonal factors from the quarterly data alone, let $\hat M_1, \hat M_2, \ldots, \hat M_{12}$ be the usual estimates of the seasonal factors from the monthly data alone, and define

$$\hat q_1 = \hat M_1 + \hat M_2 + \hat M_3, \qquad \hat q_2 = \hat M_4 + \hat M_5 + \hat M_6,$$
$$\hat q_3 = \hat M_7 + \hat M_8 + \hat M_9, \qquad \hat q_4 = \hat M_{10} + \hat M_{11} + \hat M_{12}.$$
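It is suggested that the following estimates $S_i$ of the monthly seasonal factors, based on both the quarterly and monthly data, are to be preferred to the estimates $\hat M_i$ based on the monthly data alone. We define

$$\begin{aligned}
S_1 &= \hat M_1 + (\hat Q_1 - \hat q_1)\,\frac{n}{3(k+n)}\\
S_2 &= \hat M_2 + (\hat Q_1 - \hat q_1)\,\frac{n}{3(k+n)}\\
S_3 &= \hat M_3 + (\hat Q_1 - \hat q_1)\,\frac{n}{3(k+n)}\\
S_4 &= \hat M_4 + (\hat Q_2 - \hat q_2)\,\frac{n}{3(k+n)}\\
&\;\;\vdots\\
S_{12} &= \hat M_{12} + (\hat Q_4 - \hat q_4)\,\frac{n}{3(k+n)}
\end{aligned}$$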


The motivation for the definition of $S_i$ is that, if we consider (unbiased) estimates of the form

$$S_1 = \hat M_1 + (\hat Q_1 - \hat q_1)\,a,$$

then, as shown in the next section, the choice $a = n/[3(k+n)]$ minimizes the variance of the estimate. It may also be noted that

$$S_1 + S_2 + S_3 = \hat q_1 + (\hat Q_1 - \hat q_1)\,\frac{n}{k+n} = \frac{k\hat q_1 + n\hat Q_1}{k+n},$$

which is the natural (best linear unbiased) estimate of the quarterly seasonal factor $Q_1$.
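The definitions of $S_1, \ldots, S_{12}$ can be sketched as a small function taking the twelve $\hat M_i$ and four $\hat Q_j$. The function name and the numbers in the usage lines are illustrative assumptions, not data from the note.

```python
def combined_additive(M_hat, Q_hat, n, k):
    """S_i = M_i + (Q_j - q_j) * n / (3(k + n)) for month i in quarter j (additive model)."""
    a = n / (3 * (k + n))                 # weight that minimizes Var S_i
    S = []
    for j in range(4):                    # quarter j (0-based) holds months 3j+1..3j+3
        months = M_hat[3 * j: 3 * j + 3]
        q_j = sum(months)                 # q_j from the monthly data alone
        S.extend(m + (Q_hat[j] - q_j) * a for m in months)
    return S

# Illustrative (made-up) seasonal estimates: 12 monthly and 4 quarterly
M_hat = [-3.0, -1.5, 0.5, 1.0, 2.0, 2.5, 1.5, 0.5, -0.5, -1.0, -1.0, -1.0]
Q_hat = [-3.6, 5.0, 1.8, -3.2]
n, k = 15, 3
S = combined_additive(M_hat, Q_hat, n, k)
print([round(s, 3) for s in S])
```

Each quarter's three $S_i$ then sum to $(k\hat q_j + n\hat Q_j)/(k+n)$, as noted above.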

MEAN AND VARIANCE OF $S_i$

We have, in terms of the constant a,

$$E S_1 = E\hat M_1 + E(\hat Q_1 - \hat q_1)\,a = E\hat M_1 + 0,$$

so that the estimates $S_i$ are unbiased. Also

$$\operatorname{Var} S_1 = \operatorname{Var}\left[(1 - a)\hat M_1 - a(\hat M_2 + \hat M_3) + a\hat Q_1\right].$$

Now the estimates are based on non-overlapping time periods and so may be assumed statistically independent; hence the variances are additive. Further, $\operatorname{Var}\hat M_i = A/k$, where A is a constant independent of i, the variance being inversely proportional to the number of observations on which the estimate is based. The estimate $\hat Q_i$ has the form of the sum of three separate monthly estimates (these latter not being observed individually), and so we may take $\operatorname{Var}\hat Q_i = 3A/n$.

Therefore,

$$\operatorname{Var} S_i = \left[(1 - a)^2 + 2a^2\right]\frac{A}{k} + a^2\,\frac{3A}{n} = \left[3a^2(k + n) - 2an + n\right]\frac{A}{kn},$$

which takes a minimum value of

$$\left[1 - \frac{n}{3(k+n)}\right]\frac{A}{k}$$

when $a = n/[3(k+n)]$.

Thus $\operatorname{Var} S_i < \operatorname{Var}\hat M_i$, so that the estimates $S_i$ based on the combined data are better than the usual estimates $\hat M_i$ based on the monthly data alone. Since $\operatorname{Var}\hat M_i/\operatorname{Var} S_i - 1 = n/(2n + 3k)$, the percentage increase in the variance which must be accepted if the quarterly data are not utilized is

$$\frac{n}{2n + 3k}\times 100\%;$$

for example, if n = 15 and k = 3 the increase would be 38%, while if n = 20 and k = 3 it would be 41%. For large k and fixed n the increase approaches zero; for large n and fixed k it approaches 50%.
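The variance formula and the quoted percentages can be reproduced directly. In this sketch (function names are ours) A is set to 1, since it cancels in the ratio, and a = 0 corresponds to using $\hat M_i$ alone.

```python
def var_S(a, n, k, A=1.0):
    """Var S_i for weight a, using Var M_i = A/k and Var Q_j = 3A/n."""
    return ((1 - a) ** 2 + 2 * a ** 2) * A / k + 3 * a ** 2 * A / n

def pct_increase(n, k):
    """Percentage increase in variance when the quarterly data are ignored (a = 0)."""
    a_opt = n / (3 * (k + n))
    return 100 * (var_S(0.0, n, k) / var_S(a_opt, n, k) - 1)

for n, k in [(15, 3), (20, 3)]:
    print(n, k, round(pct_increase(n, k), 1), round(100 * n / (2 * n + 3 * k), 1))
```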

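THE MULTIPLICATIVE MODEL

Assume now that the seasonal factors are multiplicative. As before, let $\hat Q_1, \ldots, \hat Q_4$ and $\hat M_1, \ldots, \hat M_{12}$ be the usual estimates based on the separate quarterly and monthly data, respectively. Now define

$$\hat q_1 = (\hat M_1\hat M_2\hat M_3)^{1/3}, \quad \hat q_2 = (\hat M_4\hat M_5\hat M_6)^{1/3}, \quad \hat q_3 = (\hat M_7\hat M_8\hat M_9)^{1/3}, \quad \hat q_4 = (\hat M_{10}\hat M_{11}\hat M_{12})^{1/3}.$$

Then the suggested estimates using the combined quarterly and monthly data are

$$\begin{aligned}
S_1 &= \hat M_1\,(\hat Q_1/\hat q_1)^{n/(k+n)}\\
S_2 &= \hat M_2\,(\hat Q_1/\hat q_1)^{n/(k+n)}\\
S_3 &= \hat M_3\,(\hat Q_1/\hat q_1)^{n/(k+n)}\\
S_4 &= \hat M_4\,(\hat Q_2/\hat q_2)^{n/(k+n)}\\
&\;\;\vdots\\
S_{12} &= \hat M_{12}\,(\hat Q_4/\hat q_4)^{n/(k+n)}
\end{aligned}$$

The motivation for the estimate $S_i$ now is that

$$(S_1S_2S_3)^{1/3} = \left(\hat q_1^{\,k}\,\hat Q_1^{\,n}\right)^{1/(k+n)},$$

which is the natural estimate of $Q_1$.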
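The multiplicative definitions can be sketched in the same way; the function name and the made-up factors below are illustrative only. The usage lines confirm that each quarter's geometric mean of $S_i$ equals $(\hat q_j^{\,k}\hat Q_j^{\,n})^{1/(k+n)}$.

```python
from math import prod

def combined_multiplicative(M_hat, Q_hat, n, k):
    """S_i = M_i * (Q_j / q_j) ** (n / (k + n)), q_j = geometric mean of quarter j's months."""
    e = n / (k + n)
    S = []
    for j in range(4):
        months = M_hat[3 * j: 3 * j + 3]
        q_j = prod(months) ** (1 / 3)
        S.extend(m * (Q_hat[j] / q_j) ** e for m in months)
    return S

M_hat = [0.90, 0.95, 1.02, 1.05, 1.08, 1.10, 1.04, 0.99, 0.97, 0.98, 0.96, 0.97]  # made up
Q_hat = [0.94, 1.09, 1.01, 0.96]
n, k = 15, 3
S = combined_multiplicative(M_hat, Q_hat, n, k)
print([round(s, 4) for s in S])
```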

The multiplicative model is less convenient mathematically than the additive model, and the means and variances can no longer be calculated exactly. However, the properties of these estimates should be similar to those of the corresponding estimates in the additive model.

EVOLVING SEASONAL PATTERNS

We have dealt only with the case of a stable seasonal pattern; analogous estimates could be constructed for evolving seasonal patterns.


AN ANALYSIS OF A CUT-OFF SURVEY


Max A. Bershad*


PURPOSE

This paper attempts to shed some light on the problem of determining the conditions under which cut-off surveys can provide estimates as reliable as those from complete frame surveys. It does so by indicating how close to the truth one must be able to judge the magnitude of the truncated universe which is being sampled relative to the total universe of interest.

THE PROBLEM

In more specific terms, a survey has a coefficient of variation of V for an estimate x of total sales X. It has two strata, the first of which accounts for a proportion $P_s$ of the total sales and a proportion $P_v$ of the total variance. We assume the bias of the survey estimate x to be zero.

If $x_1$ is an unbiased estimate of the sales of the first stratum as derived from the sample and $p_s$ is a judgment of $P_s$, under what conditions is $x_1/p_s$ a better estimate, from the point of view of mean square error, than the survey estimate?

THE SOLUTION

Since the estimate considered is $x_1/p_s$, its variance is $\sigma_1^2/p_s^2$, where $\sigma_1^2 = P_v\sigma^2$ is the variance of $x_1$, and its squared bias is

$$X^2\left(\frac{p_s - P_s}{p_s}\right)^2.$$

The variance of the survey is $\sigma^2 = V^2X^2$. Now the question to be answered may be stated as follows. When is

$$\frac{\sigma_1^2}{p_s^2} + X^2\left(\frac{p_s - P_s}{p_s}\right)^2 < \sigma^2,$$

that is, when is

$$\left(\frac{p_s - P_s}{p_s}\right)^2 < V^2\left(1 - \frac{P_v}{p_s^2}\right)?$$

For simplicity we shall take $p_s = P_s$ in the right side of the inequality and ask if:

$$\frac{|p_s - P_s|}{p_s} < V\sqrt{1 - \frac{P_v}{P_s^2}}. \qquad (*)$$


AN ILLUSTRATION: RETAIL SALES


*The final version was prepared after the author's death by Michael Berry.


The values of the parameters $P_v$, $P_s$, and V are shown in the following table for individual KB's and all KB's for the current retail sales survey.

PARAMETER VALUES FOR INDIVIDUAL KB's

                            Stratum 1 proportion of
                            Total       Total                                       Max |p_s - P_s|/p_s
Kind of business            variance    sales       V       sqrt(1 - P_v/P_s^2)     (percent)
                            (P_v)       (P_s)
Total                       .510        .924        .006    .635                    .4
Durable goods               .760        .943        .010    .381                    .4
Nondurable goods            .570        .914        .008    .564                    .5
Food                        .380        .910        .012    .736                    .9
Eating & drinking           .700        .853        .021    .195                    .4
General merchandise         .670        .989        .008    .561                    .4
Apparel                     .730        .935        .023    .406                    .9
Furniture & appliances      .640        .914        .028    .484                    1.4
Lumber                      .930        .960        .030    .010                    .0
Automotive                  .670        .954        .014    .514                    .7
Gasoline service            .450        .794        .023    .535                    1.2
Drug stores                 .890        .963        .024    .201                    .5

The same table also shows, in the last column, the largest possible inaccuracy of $p_s$ as a judgment of $P_s$ in order not to lose reliability, that is, the right side of (*) (V times the preceding column), in percent. The last column is constructed assuming no bias in the survey estimate.

It will be noted that $|p_s - P_s|/p_s$ must be less than or equal to V in every case; this means that one would have to judge $P_s$ more closely than the V value, at the very least, not to lose reliability.

Since ordinarily a dollar savings would result from dropping the sample in stratum 2, it is possible to use the savings to increase the sample size in the first stratum. If the sample size is multiplied by a factor $(1 + \alpha)$, the variance of $x_1$ is divided (approximately) by $(1 + \alpha)$ and the formula (*) becomes

$$\frac{|p_s - P_s|}{p_s} < V\sqrt{1 - \frac{P_v}{P_s^2}\cdot\frac{1}{1 + \alpha}}.$$

If one could double the sample size, i.e., $\alpha = 1$, then for total retail trade the bound for the relative error of $p_s$ would change from .4% to .5%.
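The last column of the table and the effect of enlarging the stratum 1 sample follow from the right side of (*). The sketch below, with a function name of our own, types in only a few rows of the table. Rows where $P_v > P_s^2$ (with the rounded values, Lumber is one) leave nothing under the square root, so the sketch returns 0 there.

```python
from math import sqrt

def cutoff_bound(P_v, P_s, V, alpha=0.0):
    """Right side of (*): the largest tolerable relative error |p_s - P_s| / p_s.

    alpha > 0 models multiplying the stratum 1 sample by (1 + alpha), assumed to
    divide the variance of x_1 by (1 + alpha).
    """
    inside = 1 - P_v / (P_s ** 2 * (1 + alpha))
    return V * sqrt(max(inside, 0.0))   # 0: no judgment of P_s is accurate enough

rows = {  # kind of business: (P_v, P_s, V), copied from the table
    "Total": (.510, .924, .006),
    "Durable goods": (.760, .943, .010),
    "Food": (.380, .910, .012),
    "Furniture & appliances": (.640, .914, .028),
}
for kb, (P_v, P_s, V) in rows.items():
    print(f"{kb:24s} {cutoff_bound(P_v, P_s, V):.2%}  "
          f"doubled sample: {cutoff_bound(P_v, P_s, V, alpha=1):.2%}")
```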

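THE CRITERION FOR MONTH-TO-MONTH RATIO

The preceding was concerned with the level of the estimate. A similar problem exists with respect to the month-to-month ratio.

With the additional notation of t to indicate one month and t' to indicate the succeeding month, and r's to indicate ratios, we have, by an analysis similar to the one for level, the cut-off estimate of the ratio

$$r' = \frac{x_{1t'}/p_{st'}}{x_{1t}/p_{st}} = \frac{r_1}{p_{st'}/p_{st}}, \qquad r_1 = \frac{x_{1t'}}{x_{1t}},$$

with

$$\operatorname{MSE}(r') \approx \frac{\sigma_{r_1}^2}{(p_{st'}/p_{st})^2} + R_1^2\left(\frac{p_{st}}{p_{st'}} - \frac{P_{st}}{P_{st'}}\right)^2,$$

where $R_1$ is the true stratum 1 ratio and $\sigma_{r_1}^2$ is the variance of $r_1$. Requiring $\operatorname{MSE}(r') < \sigma_r^2$, the variance of the survey ratio r, and taking $R_1 = R = 1$ for simplicity (R being the true ratio for the whole universe), leads again to

$$\frac{\left|p_{st'}/p_{st} - P_{st'}/P_{st}\right|}{p_{st'}/p_{st}} < V_r\sqrt{1 - \frac{\sigma_{r_1}^2}{\sigma_r^2}},$$

where $V_r$ is the coefficient of variation of r. With values from the retail survey for all KB of $V_r = .002$ and $P_v = \sigma_{r_1}^2/\sigma_r^2 = .7$, the relative error of $p_{st'}/p_{st}$ would have to be less than .1 of 1%.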


OPTIMAL ALLOCATION IN STRATIFIED SAMPLING WITH MULTIPLE VARIANCE CONSTRAINTS


B. Causey


This article discusses how to find a way of sampling randomly from a set of strata so as to minimize sampling costs subject to satisfying a set of constraints, these constraints being upper bounds to variances of population mean estimators for a set of variates of interest. The problem is easily solved when there is only one variate; but for more than one variate an approach such as ours is needed.

INTRODUCTION

Suppose that we wish to sample randomly $n_h$ units from each of H given strata containing $N_h$ units, h = 1, ..., H, respectively, with the cost per unit sampled $s_h$ (> 0) in stratum h, and the stratum variance for a particular variate y of interest equal to $v_h$ (> 0) within stratum h. Let $N = \sum_h N_h$, and let $\bar y_h$ be the sample average of the $n_h$ sample values of y for stratum h. We let A denote our estimate $\sum_h N_h\bar y_h/N$ of the population mean of y, so that, letting $z_h = v_hN_h^2/[N^2(N_h - 1)]$,

$$\operatorname{Var}(A) = \sum_h z_h\,\frac{N_h - n_h}{n_h}.$$

We want to choose the quantities $n_h$ so as to minimize the total cost $\sum_h s_hn_h$ subject to the constraint that $\operatorname{Var}(A) \le u$ for a specified value u > 0, i.e. that $\sum_h z_hN_h/n_h \le d$ with $d = u + \sum_h z_h$. Letting $x_h = 1/(s_hn_h)$ and $B_h = s_hz_hN_h/d$, so that the problem is to minimize $\sum_h(1/x_h)$ subject to $\sum_h B_hx_h \le 1$ and $x_h > 0$, h = 1, ..., H, we find by the use of Lagrange multipliers or otherwise that an optimal solution for $x_h$ is

$$x_h = \frac{1}{B_h^{1/2}\sum_i B_i^{1/2}},$$

and hence for $n_h$,

$$n_h = \frac{1}{s_hx_h} = \frac{B_h^{1/2}\sum_i B_i^{1/2}}{s_h}.$$

Here and throughout the paper we do not account for the fact that the values of $n_h$, h = 1, ..., H, must in practice be integers. Integer solutions for $(n_1, \ldots, n_H)$ in the neighborhood of the calculated (non-integer) optimal values for this vector may easily be investigated.
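The one-constraint solution can be sketched as follows; the function name and the inputs in the usage lines are hypothetical. The test checks that the allocation makes $\operatorname{Var}(A)$ exactly u and satisfies the Lagrange condition that $s_hn_h^2/(z_hN_h)$ is the same for every stratum.

```python
from math import sqrt

def one_constraint_allocation(s, v, N_h, u):
    """Optimal n_h minimizing sum s_h n_h subject to Var(A) <= u (closed form)."""
    N = sum(N_h)
    z = [vh * Nh ** 2 / (N ** 2 * (Nh - 1)) for vh, Nh in zip(v, N_h)]
    d = u + sum(z)                                          # d = u + sum_h z_h
    B = [sh * zh * Nh / d for sh, zh, Nh in zip(s, z, N_h)]
    root_sum = sum(sqrt(b) for b in B)
    x = [1 / (sqrt(b) * root_sum) for b in B]               # x_h = 1/(B_h^(1/2) sum_i B_i^(1/2))
    n = [1 / (sh * xh) for sh, xh in zip(s, x)]             # n_h = 1/(s_h x_h)
    return n, z

s, v, N_h, u = [30, 20, 15], [20.0, 2.0, 3.0], [1000, 1500, 1250], 0.01   # hypothetical inputs
n, z = one_constraint_allocation(s, v, N_h, u)
var_A = sum(zh * (Nh - nh) / nh for zh, Nh, nh in zip(z, N_h, n))
print([round(nh, 1) for nh in n], var_A)
```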

Now suppose that instead of just one constraint $\sum_h(z_hN_h/n_h) \le d$, we have, for C > 1, C constraints, corresponding to C variates of interest, given by $\sum_h(z_{ch}N_h/n_h) \le d_c$, which, letting $B_{ch} = s_hz_{ch}N_h/d_c$, we express as

$$\sum_h B_{ch}x_h \le 1, \qquad c = 1, \ldots, C. \qquad (1.1)$$

The problem of minimizing $\sum_h(1/x_h)$ subject to the constraints (1.1) is the topic of this paper. Bershad in [1] has approached the problem for $s_1 = s_2 = \cdots = s_H$; but it appears that for methods such as his, which iteratively either add, remove, or switch a unit "optimally", a succession of steps each optimal in terms of the current state may produce a net effect which is not optimal, as counterexamples for his method have illustrated. Hence we are led to ways of assigning the whole sample optimally in one sweep, and hence to a convex programming technique, with special features added for the particular problem at hand.

Section 2 outlines this technique; Section 3 considers the kinds and quantities of computer operations involved; Section 4 discusses an alternative method which should be especially useful when the number C of constraints is large relative to the number H of strata and/or many constraints tend to be redundant in terms of others; and Section 5 provides some numerical examples.

BASIC TECHNIQUE FOR SOLUTION

We let $X = (x_1, \ldots, x_H)$, $f(X) = \sum_h(1/x_h)$, and $g_c(X) = 1 - \sum_h B_{ch}x_h$; our problem is thus to

$$\text{minimize } f(X) \qquad (2.1)$$

subject to

$$g_c(X) \ge 0,\ c = 1, \ldots, C, \quad\text{and}\quad x_h \ge 1/(s_hN_h),\ h = 1, \ldots, H. \qquad (2.2)$$

(The second set of constraints is equivalent to $n_h \le N_h$.) Let S denote the set of X satisfying the constraints (2.2). We have that for $X \in S$, f(X) is strictly convex in X, since the matrix of second partial derivatives $\partial^2 f/\partial x_i\partial x_j$ is strictly positive definite, with entries $2x_h^{-3}$ along the diagonal and zero off the diagonal, while all the functions $g_c(X)$ are linear in X. From [2] we may thus establish that there is a unique $X^* \in S$ such that $f(X^*) = \min_{X \in S} f(X)$. Let $r_k$, k = 1, 2, ..., be a sequence of positive numbers with $\lim_k r_k = 0$. For k = 1, 2, ..., and $X \in S$, we let

$$P_k(X) = f(X) - r_k\left\{\sum_{c=1}^{C}\log g_c(X) + \sum_{h=1}^{H}\log\left[x_h - 1/(s_hN_h)\right]\right\}; \qquad (2.3)$$

we may establish from [2, Lemma 13] that $P_k(X)$ attains a minimum in S uniquely at a point $X_k^*$, k = 1, 2, ..., and that $\lim_k X_k^* = X^*$. Thus by minimizing $P_k$ for values of k = 1, 2, ... approaching $\infty$, we may approximate $X^*$ as closely as we like.

To find $X_1^*$, we must begin with a starting point $X_0 \in S$. Such a point is $(x_{01}, \ldots, x_{0H})$, where $x_{0h} = \min_c x_{ch}$ and $x_{ch}$ is obtained for constraint c according to the one-constraint allocation scheme of the introduction, c = 1, ..., C, h = 1, ..., H. Here we take note of the possibility that for some c' all C constraints may be satisfied by $X_{c'} = (x_{c'1}, \ldots, x_{c'H})$; in this unlikely event $X^* = X_{c'}$. Otherwise, we compute at $X_0$ the vector

$$V = -\nabla P_1\,(\nabla^2 P_1)^{-1},$$

where the vector $\nabla P_1$ has components $\partial P_1/\partial x_1, \ldots, \partial P_1/\partial x_H$ and the H × H matrix $\nabla^2 P_1$ has entry $\partial^2 P_1/\partial x_i\partial x_j$ in row i, column j. We then find the positive scalar t equal to the minimum of 1/2, u/2, and w, where u is the largest positive scalar such that $X_0 + uV \in S$, and w is such that

$$\min_{z \ge 0} P_1(X_0 + zV) = P_1(X_0 + wV). \qquad (2.4)$$

(Using the fact that $\nabla^2 P_1$ is always positive definite, we may establish easily that w exists, is unique, and is strictly positive; we find w easily by "narrowing in".) We replace $X_0$ by $X_0 + tV$ and repeat the procedure, iterating until $X_1^*$ has been found to sufficient accuracy, "accuracy" being defined below.

The idea is to minimize $P_1$ and reduce $\nabla P_1$ to zero simultaneously by iteratively calculating and moving along the direction V, which experience and theoretical argument suggest to be an optimal choice of direction and worth calculating. We require that t be no greater than min(1/2, u/2) to prevent moving too far along V as computed at $X_0$, in view of the fact that V changes as a function of X (and t).

Our criterion for "accuracy" should be, of course, based on closeness of $\nabla P_1$ to zero; since $\nabla^2 P_1$ is positive definite, $\nabla P_1$ equals zero only at the unique point where $P_1$ is minimized. Observing that the components of $\nabla P_1$ are

$$\frac{\partial P_1}{\partial x_h} = -\frac{1}{x_h^2} + r_1\left[\sum_{c=1}^{C}\frac{B_{ch}}{g_c(X)} - \frac{1}{x_h - 1/(s_hN_h)}\right], \qquad h = 1, \ldots, H, \qquad (2.5)$$

we base this accuracy on a ratio (2.6) comparing the size of these components with $\sum_h(1/x_h)$, in view of the importance of $\sum_h(1/x_h)$ in the problem, stopping when this ratio is less than a specified positive quantity, which we have taken to be .01.

As stated above, the larger the value of k, the more accurately $X_k^*$ approximates $X^*$. Our procedure is to start with an initial positive $r_1$, find $X_1^*$ as above, and then for k = 2, 3, ..., let $r_k = r_{k-1}/10$ and find $X_k^*$ starting at $X_{k-1}^*$ as we found $X_1^*$ starting at $X_0$, until $X^*$ has been approximated by $X_k^*$ to sufficient accuracy. Here we base "accuracy" on the fact [2] that

$$\lim_{k \to \infty}\left[f(X_k^*) - P_k(X_k^*)\right] = 0, \qquad (2.7)$$

and stop when, along with the ratio (2.6) corresponding to k > 1 being sufficiently small, we have $|1 - P_k(X_k^*)/f(X_k^*)|$ less than a specified value, which we have taken to be .001. It appears that we should be careful not to take the initial value $r_1$ too small, or a large number of iterations may be required to find $X_1^*$ to our desired accuracy. Loosely speaking, a safely large value for $r_1$ appears to be $vs/(dC)$ (give or take a factor or two of 10), where v, s, and d represent composite values of the quantities $v_{ch}$, $s_h$, and $d_c$.
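The following is a minimal sketch of the barrier procedure in Python with NumPy, not the Census program. It rests on our own simplifying assumptions: the starting point is pulled 10% inside S so that every logarithm in (2.3) is finite, each inner loop runs a fixed number of Newton steps instead of using the (2.6) stopping rule, w is found by golden-section search, and the Newton system is solved directly with `np.linalg.solve`. The function name `barrier_allocation` and its parameters are hypothetical.

```python
import numpy as np

def barrier_allocation(B, lb, r1=1.0, n_outer=8, n_inner=60):
    """Minimize sum(1/x) subject to B @ x <= 1 (constraints (1.1)) and x >= lb,
    lb_h = 1/(s_h N_h), by minimizing P_k of (2.3) for r_k = r1, r1/10, ..."""
    B, lb = np.asarray(B, float), np.asarray(lb, float)
    sqrtB = np.sqrt(B)
    # X_0: smallest one-constraint optimum per stratum, pulled 10% inward (our choice).
    x = 0.9 * np.min(1.0 / (sqrtB * sqrtB.sum(axis=1, keepdims=True)), axis=0)
    r = r1

    def P(y):  # barrier function (2.3) for the current r
        return np.sum(1.0 / y) - r * (np.sum(np.log(1.0 - B @ y)) + np.sum(np.log(y - lb)))

    for _ in range(n_outer):
        for _ in range(n_inner):
            g, slack = 1.0 - B @ x, x - lb
            grad = -1.0 / x**2 + r * (B.T @ (1.0 / g) - 1.0 / slack)   # components (2.5)
            if np.abs(grad).max() < 1e-12:
                break
            hess = np.diag(2.0 / x**3 + r / slack**2) + r * (B.T / g**2) @ B
            V = -np.linalg.solve(hess, grad)                          # Newton direction
            BV = B @ V                       # u = largest step keeping X + uV inside S
            steps = np.concatenate([g[BV > 0] / BV[BV > 0], slack[V < 0] / -V[V < 0]])
            u = steps.min() if steps.size else np.inf
            lo, hi = 0.0, min(0.5, 0.5 * u)
            for _ in range(50):              # golden-section search for w on [0, hi]
                m1, m2 = hi - 0.618 * (hi - lo), lo + 0.618 * (hi - lo)
                if P(x + m1 * V) < P(x + m2 * V):
                    hi = m2
                else:
                    lo = m1
            x = x + 0.5 * (lo + hi) * V      # step t = min(1/2, u/2, w)
        r /= 10.0                            # r_k = r_{k-1} / 10
    return x

# Hypothetical two-constraint problem; both constraints bind at the optimum.
B = [[0.5, 0.1, 0.1],
     [0.1, 0.1, 0.5]]
lb = [1e-6, 1e-6, 1e-6]
x = barrier_allocation(B, lb)
print(x, np.asarray(B) @ x)
```

With one constraint the result should agree with the closed form of the introduction, which gives a simple check; for large H the rank-one inversion of Section 3 could replace the direct solve.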

DIRECTION VECTOR CALCULATION

Except possibly for very small H, the overwhelming bulk of the calculations for Section 2 are involved in the (repeated) calculation of V. We calculate $(\nabla^2 P_k)^{-1}$ and $\nabla P_k$ separately, and then form their product V, so that the bulk of the calculations required are those needed to find $(\nabla^2 P_k)^{-1}$. To calculate this inverse, we (repeatedly) use the facts that (1) $\nabla^2 P_k = T + \sum_c Z_c'Z_c$, where T is diagonal with entries $2/x_h^3 + r_k/[x_h - 1/(s_hN_h)]^2$, h = 1, ..., H, and thus easily inverted, and the row vector $Z_c$ equals

$$Z_c = \frac{r_k^{1/2}}{g_c(X)}\,(B_{c1}, \ldots, B_{cH}), \qquad c = 1, \ldots, C, \qquad (3.1)$$

and (2) for a symmetric invertible matrix A and row vector U, $(A + U'U)^{-1}$ equals $A^{-1} + a(W'W)$, where the row vector W equals $UA^{-1}$ and the scalar a equals $-1/(1 + UW')$. If the symmetry of $\nabla^2 P_k$ is utilized and we save $H^2$ storage spaces for the entire matrix, the number of additions required and multiplications required is of order $3CH^2/2$; if we do not save $H^2$ storage spaces but only H, we may work with $\nabla^2 P_k$ column-by-column, and use binary scratch tapes for storage, with an order of $2CH^3$ additions and $2CH^2$ multiplications required.

Alternatively, V might be found in connection with the Crout method [3] for solving a system of simultaneous linear equations with symmetric coefficient matrix. Here we must save $H^2$ storage spaces. First, an order of $CH^2/2$ additions and multiplications are required to find $\nabla^2 P_k$; then an order of $H^3/6$ additions and $H^3/6$ multiplications, best done in double precision and roughly equivalent to $H^3/3$ in single precision, are required. Thus our method has particular advantage over the Crout method when (1) H is so large that storage of an H × H matrix is difficult or impossible, or (2) the ratio C/H is small.
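Fact (2) is the Sherman-Morrison identity. A minimal sketch, assuming NumPy and a hypothetical function name, builds $(T + \sum_c Z_c'Z_c)^{-1}$ from $T^{-1}$ by C successive rank-one updates, each costing on the order of $H^2$ operations, and checks the result against a direct inverse on random data.

```python
import numpy as np

def hessian_inverse(T_diag, Z):
    """Invert T + sum_c Z_c' Z_c by C rank-one (Sherman-Morrison) updates.

    T_diag: the H diagonal entries of T; Z: C x H array whose rows are the Z_c of (3.1).
    """
    inv = np.diag(1.0 / np.asarray(T_diag, float))   # T^{-1} is trivial for diagonal T
    for U in np.asarray(Z, float):
        W = U @ inv                      # row vector W = U A^{-1}
        a = -1.0 / (1.0 + U @ W)         # scalar a = -1/(1 + U W')
        inv = inv + a * np.outer(W, W)   # (A + U'U)^{-1} = A^{-1} + a W'W (A symmetric)
    return inv

rng = np.random.default_rng(0)
T_diag = rng.uniform(0.5, 2.0, size=6)
Z = rng.normal(size=(3, 6))
approx = hessian_inverse(T_diag, Z)
direct = np.linalg.inv(np.diag(T_diag) + Z.T @ Z)
print(np.abs(approx - direct).max())
```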

AN ALTERNATIVE METHOD

In cases where C/H is large, say 1/2 or 1 or greater, or when many constraints are apt to be redundant in terms of others, or for other reasons, it may be that a solution $X^*$ found to be optimal subject to only a small subset of the original constraints will satisfy all the other constraints as well, and thus be optimal subject to all constraints. Thus we are led to the following approach.

As noted in Section 2, it may be that all constraints are satisfied by a solution $X_{c'}$, in which case we are done. If not, we let $c_1$ be such that

$$\sum_h(1/x_{c_1h}) = \max_c\sum_h(1/x_{ch}),$$

let $X_1^*$ equal $X_{c_1}$, and let $c_2$ be such that $g_{c_2}(X_1^*)$ is minimized (a negative value), i.e. $c_2$ is the constraint farthest from being satisfied, and minimize f subject to constraints $c_1$ and $c_2$. Then if the resultant solution $X_2^*$ satisfies all other constraints, we are done; otherwise we choose $c_3$ in terms of $X_2^*$ as we did $c_2$ in terms of $X_1^*$ and proceed likewise. In this manner, we keep adding one constraint at a time and re-solving the problem until a solution satisfying all constraints is obtained.

For this approach it is clear that the ratio of current constraints to H will, at least at first, be very small, and that the Section 3 way of finding the vector V can thus be used to great advantage.

A documented computer program has been written which permits use of both our basic Section 2 method and our Section 4 alternative. The program also permits us to solve a problem partially, then resume solution of it in another computer run, from the point of termination of the previous run. Anyone interested in such a program should write to the Statistical Research Division, Bureau of the Census.

EXAMPLE

We considered a problem with: H = 5; C = 3; the vector $(N_1, \ldots, N_5)$ equal to 1000(100, 150, 125, 75, 50); the vector $(s_1, \ldots, s_5)$ equal to (30, 20, 15, 25, 35); the vector $(u_1, u_2, u_3)$ equal to (2, 1, 1); and the 3 × 5 matrix $((v_{ch}))$ equal to

$$\begin{pmatrix} 20 & 20 & 20 & 20 & 20\\ 20 & 2 & 3 & 20 & 5\\ 4 & 17 & 16 & 7 & 12 \end{pmatrix}.$$

The value of $r_1$ was 10000; the initial vector $X_0$ was .001(.15550, .11595, .16560, .22712, .31296). The numbers of iterations corresponding to $r_k$ were 9 for k = 1, 8 for k = 2, 8 for k = 3, 7 for k = 4, 8 for k = 5, 7 for k = 6, and 7 for k = 7. For k = 8, the vector X stabilized right away, to 5 decimal places, at .001(.22329, .12491, .17715, .27894, .32733), and was unchanged (to 5 decimal places) thereafter, although convergence according to (2.6) did not occur (after 146 iterations) because the functions $g_c$ could not be computed to the needed accuracy of 9 or 10 decimal places (only to about 8 places, as needed for k = 7). The corresponding vector $(n_1, \ldots, n_5)$, which we take as a final solution to the problem, was to 1 decimal place (149.3, 400.3, 376.3, 143.4, 87.3); the total cost was 24769.4.

Slight modification of our computer program to deal with circumstances such as the above, where no convergence occurred for k = 8, is entirely possible.
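The final allocation follows from the reported vector X through $n_h = 1/(s_hx_h)$, and the total cost equals $\sum_h s_hn_h = \sum_h 1/x_h$. The short check below uses only the example's published numbers.

```python
s = [30, 20, 15, 25, 35]                                                  # costs s_h
x = [0.001 * v for v in (0.22329, 0.12491, 0.17715, 0.27894, 0.32733)]    # reported X, k = 8
n = [1 / (sh * xh) for sh, xh in zip(s, x)]                               # n_h = 1/(s_h x_h)
cost = sum(1 / xh for xh in x)                                            # sum_h s_h n_h
print([round(v, 1) for v in n], round(cost, 1))
```

Because X is reported only to five places, the recomputed cost (about 24769.2) differs slightly from the reported 24769.4.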
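NOTES ON THE EXPECTED VALUE OF CERTAIN HARMONIC SUMS


Jack Ogus


The values of the quantity

$$H_R = \frac{1}{1 + \sum_{h=1}^{R} a_h},$$

when $a_h \in \{0, 1\}$, are the terms of the harmonic series, $1, \frac12, \ldots, \frac{1}{r+1}, \ldots, \frac{1}{R+1}$. If the $a_h$ are independent and $P(a_h = 1) = p$ for all h, the sum $r = \sum_{h=1}^{R} a_h$ will have a binomial distribution, and, with q = 1 - p, the expected value of $H_R$ will be

$$\begin{aligned}
E H_R &= \sum_{r=0}^{R}\frac{1}{r+1}\binom{R}{r}p^rq^{R-r} = \frac{1}{p(R+1)}\sum_{r=0}^{R}\frac{(R+1)!}{(R-r)!\,(r+1)!}\,p^{r+1}q^{(R+1)-(r+1)}\\
&= \frac{1}{p(R+1)}\left[\sum_{r+1=0}^{R+1}\frac{(R+1)!}{(R-r)!\,(r+1)!}\,p^{r+1}q^{(R+1)-(r+1)} - q^{R+1}\right] = \frac{1 - q^{R+1}}{p(R+1)}. \qquad (1)
\end{aligned}$$

From (1) it follows that, if for each h = 1, ..., R + 1 we form $H_R$ from the R units other than h, and write $q_h = q$,

$$E\sum_{h=1}^{R+1} pH_R = 1 - q^{R+1} = 1 - \prod_{h=1}^{R+1} q_h. \qquad (2)$$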
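Formula (1) can be confirmed in exact rational arithmetic by summing the binomial expectation directly; the function name and the choice p = 3/10 are ours.

```python
from fractions import Fraction
from math import comb

def expected_H(R, p):
    """E H_R = E[1/(1 + r)], r ~ Binomial(R, p), computed exactly."""
    q = 1 - p
    return sum(Fraction(1, r + 1) * comb(R, r) * p ** r * q ** (R - r) for r in range(R + 1))

p = Fraction(3, 10)
for R in range(6):
    closed = (1 - (1 - p) ** (R + 1)) / (p * (R + 1))   # formula (1)
    assert expected_H(R, p) == closed
print(expected_H(4, Fraction(1, 2)))
```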


The form (2) suggests a generalization in which the $a_h$ are again independent, but the $p_h$ are not all equal. The form corresponding to the left-hand side of (2) is

$$E\sum_{h=1}^{R+1} p_hH_{R,h} = E\sum_{h=1}^{R+1}\frac{p_h}{1 + \sum_{i=1,\,i\ne h}^{R+1} a_i}. \qquad (3)$$


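Then, since $a_h$ is independent of $H_{R,h}$, and $1 + \sum_{i\ne h} a_i = \sum_i a_i$ whenever $a_h = 1$, we have

$$E\sum_{h=1}^{R+1} p_hH_{R,h} = E\sum_{h=1}^{R+1}\frac{a_h}{1 + \sum_{i\ne h} a_i} = E\left[\frac{\sum_{h=1}^{R+1} a_h}{\sum_{i=1}^{R+1} a_i}\right] = P_0\left(\frac{0}{0}\right) + \sum_{r=1}^{R+1} P_r\left(\frac{r}{r}\right), \qquad (4)$$

where $P_r$ is the probability that exactly r of the R + 1 units are selected.

The term $P_0(0/0)$ does not appear in (3). Therefore, we may define $P_0(0/0) = 0$, and then have

$$E\sum_{h=1}^{R+1} p_hH_{R,h} = \sum_{r=1}^{R+1} P_r = 1 - P_0 = 1 - \prod_{h=1}^{R+1} q_h. \qquad (5)$$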
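Result (5) can be checked for unequal $p_h$ by exact enumeration of all $2^{R+1}$ outcomes; the probabilities in the sketch are arbitrary illustrative fractions and the function name is ours.

```python
from fractions import Fraction
from itertools import product
from math import prod

def lhs_5(p):
    """E sum_h p_h H_{R,h}, with H_{R,h} = 1/(1 + sum of the other a_i), by enumeration."""
    total = Fraction(0)
    for a in product((0, 1), repeat=len(p)):
        w = prod(pi if ai else 1 - pi for ai, pi in zip(a, p))   # P(outcome a)
        s = sum(a)
        total += w * sum(ph / (1 + s - ah) for ph, ah in zip(p, a))
    return total

p = [Fraction(1, 5), Fraction(1, 2), Fraction(2, 3), Fraction(9, 10)]   # R + 1 = 4
assert lhs_5(p) == 1 - prod(1 - ph for ph in p)                         # result (5)
print(lhs_5(p))
```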

The definition of $P_0(0/0) = 0$ may appear arbitrary, but the same result may be obtained as follows. For each h, $H_{R,h}$ involves only the R units other than h, and

$$E H_{R,h} = \prod_{i\ne h} q_i\left[1 + \frac12\sum_{i\ne h}\frac{p_i}{q_i} + \frac13\sum_{\substack{i<j\\ i,j\ne h}}\frac{p_ip_j}{q_iq_j} + \frac14\sum_{\substack{i<j<k\\ i,j,k\ne h}}\frac{p_ip_jp_k}{q_iq_jq_k} + \cdots\right].$$

Hence

$$\begin{aligned}
E\sum_{h=1}^{R+1} p_hH_{R,h} &= \prod_{h=1}^{R+1} q_h\sum_{h=1}^{R+1}\frac{p_h}{q_h}\left[1 + \frac12\sum_{i\ne h}\frac{p_i}{q_i} + \frac13\sum_{\substack{i<j\\ i,j\ne h}}\frac{p_ip_j}{q_iq_j} + \cdots\right]\\
&= \prod_{h=1}^{R+1} q_h\left[\sum_{h}\frac{p_h}{q_h} + \sum_{h<i}\frac{p_hp_i}{q_hq_i} + \sum_{h<i<j}\frac{p_hp_ip_j}{q_hq_iq_j} + \cdots\right]\\
&= \sum_{r=1}^{R+1} P_r = \sum_{r=0}^{R+1} P_r - \prod_{h=1}^{R+1} q_h = 1 - \prod_{h=1}^{R+1} q_h, \qquad (6)
\end{aligned}$$

since each set of m units appears m times in the double sum, each time with the factor 1/m.

In addition to the result (5), which is a kind of average of the expected values of the $H_{R,h}$, we can derive by recursion the expected value of a particular $H_{R,h}$, say

$$H_R = \frac{1}{1 + \sum_{i=1}^{R} a_i}. \qquad (7)$$

Conditioning on $a_i$, for any i = 1, ..., R,

$$E H_R = p_i\,E\left[\frac{1}{2 + \sum_{j\ne i} a_j}\right] + q_i\,E H_{R-i}, \qquad (8)$$

where R - i denotes all R units except i. Summing (8) for i = 1, ..., R, and using the independence of $a_i$ and $\sum_{j\ne i} a_j$,

$$R\,E H_R = E\sum_{i=1}^{R}\frac{a_i}{1 + \sum_{j=1}^{R} a_j} + \sum_{i=1}^{R} q_iE H_{R-i} = E\left[1 - \frac{1}{1 + \sum_{j=1}^{R} a_j}\right] + \sum_{i=1}^{R} q_iE H_{R-i} = 1 - E H_R + \sum_{i=1}^{R} q_iE H_{R-i}. \qquad (9)$$

Solving for $E H_R$, we have $(R+1)\,E H_R = 1 + \sum_{i=1}^{R} q_iE H_{R-i}$, that is,

$$E H_R = \frac{1}{R+1}\left[1 + \sum_{i=1}^{R} q_iE H_{R-i}\right]. \qquad (10)$$
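Recursion (10) can be sketched with memoization over subsets of units and compared with direct enumeration; both functions are our own illustrations and take exponential time, so they suit only small R. For equal $p_h$ the recursion reproduces (1).

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product
from math import prod

def expected_H_recursive(p):
    """E H_R for independent a_i with P(a_i = 1) = p_i, via recursion (10)."""
    p = tuple(p)

    @lru_cache(maxsize=None)
    def EH(units):                          # units: frozenset of indices still included
        if not units:
            return Fraction(1)              # H_0 = 1/(1 + 0) = 1
        total = 1 + sum((1 - p[i]) * EH(units - {i}) for i in units)
        return total / (len(units) + 1)

    return EH(frozenset(range(len(p))))

def expected_H_direct(p):
    """E H_R by enumerating all 2^R outcomes."""
    return sum(prod(pi if ai else 1 - pi for ai, pi in zip(a, p)) / (1 + sum(a))
               for a in product((0, 1), repeat=len(p)))

p = [Fraction(1, 5), Fraction(1, 2), Fraction(2, 3), Fraction(9, 10)]
print(expected_H_recursive(p), expected_H_direct(p))
```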


A PROBLEM IN EQUALIZING NUMBERS OF DIFFERENT ITEMS AMONG DIFFERENT GROUPS

B. Causey


Suppose that (to generalize the actual experimental situation for which this problem was considered) we have J different groups of people scheduled to assemble at a given time in different places. Suppose for group j, j = 1, ..., J, that the total membership is known to be $n_j$ and that the probability of at least i persons in the group actually showing up, i = 1, ..., $n_j$, is estimated to be $c_{ji}$. Let $N = \sum_j n_j$. We have K different forms to be filled; one copy of one of the K is to be handed out to each of the persons that actually shows up. The problem of this paper is to pass out copies of the K forms so that the numbers of persons that actually fill form k, k = 1, ..., K, are as nearly equal as possible.

The  actual  experimental  situation  which 
prompted  consideration  of  the  problem  involved 
16  different  versions  of  the  decennial  Census 
form,  copies  of  which  were  passed  out  on  the 
evening  of  May  19,  1971,  at  17  different 
adult  education  classes  at  Woodson  Junior  High 
School,  Washington,  D.  C.  Much  of  the  planned 
analysis  of  the  results  could  conveniently 
be  based  only  on  equal  numbers  of  copies  of 


each of the 16 forms, so we devised the method of this paper to ensure that the copies actually filled out would be as evenly divided as possible among the 16 forms. For illustrative purposes here, however, it is convenient to use a simpler, hypothetical example. Suppose that we have 2 groups of 5 (= $n_1$) and 6 (= $n_2$) persons (thus N = 11). One way of estimating the probabilities $c_{ji}$ is to begin with an estimated average attendance rate $p_j$ for group j and then to use a binomial model with parameters $n_j$ and $p_j$; however, the probabilities $c_{ji}$ need not be estimated in this fashion. In our example, suppose that the attendance rates $p_1$ and $p_2$ are .8 and .7 respectively; the probabilities $c_{ji}$ for the two groups are thus, to three decimal places,

           i = 1      2      3      4      5      6
    j = 1     1.000   .993   .942   .737   .328
    j = 2      .999   .989   .930   .744   .420   .118

Thus, for example, the probability that 4 or more persons will show up in group 2 is estimated to be .744.
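Under the binomial model mentioned above, each $c_{ji}$ is an upper-tail probability. A minimal sketch, with `prob_at_least` as an illustrative helper name, that reproduces the table for $n_1 = 5$, $p_1 = .8$ and $n_2 = 6$, $p_2 = .7$:

```python
from math import comb


def prob_at_least(n, p):
    """Return [c_1, ..., c_n], where c_i = P(at least i of n members attend),
    assuming each member attends independently with probability p."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    # Upper-tail sums: c_i = pmf[i] + pmf[i + 1] + ... + pmf[n].
    return [sum(pmf[i:]) for i in range(1, n + 1)]


for j, (n_j, p_j) in enumerate([(5, 0.8), (6, 0.7)], start=1):
    print(j, [round(c, 3) for c in prob_at_least(n_j, p_j)])
```

Any other model for the $c_{ji}$ can be substituted, since the assignment plan only needs the probabilities themselves.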

We number our K forms from 1 to K, these numbers being assigned at random; we then form a stack of copies of the forms in which, within the 1st K copies, copy k is of form k, k = 1, ..., K, the 2nd K copies are likewise, and so on, until a stack of N or more copies is obtained. We then take the top N copies. In our example suppose that K = 4; the stack of 11 copies will be in the order 1-2-3-4-1-2-3-4-1-2-3. We let the copies from the stack to be handed out in group 1 be the 1st $n_1$ copies from the top, in group 2 the next $n_2$ copies, and so on. In our example the 5 copies for the 1st group and the 6 copies for the 2nd group will be 1-2-3-4-1 and 2-3-4-1-2-3 respectively. Let $t_k$ denote the number of appearances of form k in the stack of N, and $r_{jk}$ the number of copies of form k appearing among the $n_j$ copies for group j. For our example we have $t_1 = t_2 = t_3 = 3$ and $t_4 = 2$, with $r_{11} = r_{22} = r_{23} = 2$ and $r_{12} = r_{13} = r_{14} = r_{21} = r_{24} = 1$.
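The stack construction and the counts $t_k$ and $r_{jk}$ can be sketched as follows, taking the random numbering of the forms as already done; `deal_stack` is a hypothetical helper name, and counts are keyed from 1 as in the text.

```python
def deal_stack(group_sizes, K):
    """Cycle forms 1..K into a stack of N = sum(group_sizes) copies and deal
    consecutive runs of n_j copies to groups j = 1..J.

    Returns (groups, t, r): groups[j-1] lists group j's copies from the top,
    t[k] counts form k in the stack, r[j][k] counts form k in group j."""
    N = sum(group_sizes)
    stack = [pos % K + 1 for pos in range(N)]  # 1-2-...-K-1-2-...
    groups, start = [], 0
    for n_j in group_sizes:
        groups.append(stack[start:start + n_j])
        start += n_j
    t = {k: stack.count(k) for k in range(1, K + 1)}
    r = {j: {k: g.count(k) for k in range(1, K + 1)}
         for j, g in enumerate(groups, start=1)}
    return groups, t, r


groups, t, r = deal_stack([5, 6], K=4)
print(groups, t, r)
```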
The problem may be viewed as a series of N assignments of forms to priorities, as follows. For group j, the copy with priority i is the one which will be handed out to someone if i or more persons actually show up and which will not be handed out if fewer than i persons show up, i = 1, ..., $n_j$. Let $f_{ji}$ be the form number corresponding to the copy with priority i in group j. There is an (estimated) probability $c_{ji}$ that this copy will actually be filled. In our example the numbers $f_{ji}$ turn out to be

           i = 1   2   3   4   5   6
    j = 1      2   1   4   3   1
    j = 2      4   3   2   1   3   2

Thus, for example, if 4 persons actually show up in group 2, copies 4-3-2-1 will be passed out; if 3 persons actually show up, copies 4-3-2 will be passed out. Let $v_{kh}$, h = 1, ..., $t_k$, be the probability $c_{ji}$ associated with the h-th time that form k is assigned to a particular priority (i) in a particular group (j). Thus the sum $E_k = \sum_{h=1}^{t_k} v_{kh}$ is the expected number of times that form k will actually be filled. In our example we obtain $v_{11}$, $v_{12}$, and $v_{13}$ equal to .328, .744, and .993 respectively, with $E_1 = 2.065$; we obtain $E_2$, $E_3$, and $E_4$ equal to 2.048, 2.146, and 1.941 respectively.
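Computing $E_k$ only requires summing $c_{ji}$ over the positions that hold form k; the order of the $v_{kh}$ does not matter for the sum. A minimal sketch using the example's three-decimal table values (which is how the figures above add up); `expected_fills` is a hypothetical name.

```python
def expected_fills(f, c, K):
    """E_k = sum of c[j][i] over every (group j, priority i + 1) whose copy is form k."""
    E = {k: 0.0 for k in range(1, K + 1)}
    for f_row, c_row in zip(f, c):
        for form, prob in zip(f_row, c_row):
            E[form] += prob
    return {k: round(e, 3) for k, e in E.items()}


f = [[2, 1, 4, 3, 1], [4, 3, 2, 1, 3, 2]]
c = [[1.000, .993, .942, .737, .328], [.999, .989, .930, .744, .420, .118]]
print(expected_fills(f, c, K=4))
```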

A primary goal in assignment is that variation among the quantities $E_k$ should be small, i.e. the (expected) numbers of copies of the different forms should tend to be equal. Besides this objective, it is very important that within each group the expected numbers of copies of the different forms should tend to be equal (so that no one group will be overly represented among the copies of any one particular form), and, simultaneously, that the various forms should tend to be uniformly represented among the highest and among the lowest priorities (so that, if attendance is much higher or lower than expected, the numbers of copies obtained for each form should still tend to be equal). Step 2-A below helps to achieve these last two objectives. Furthermore, it is desired that, over all groups taken as a whole, as for each single group, the various forms tend to be uniformly represented among the highest and among the lowest priorities, so that, if attendance in all groups is uniformly lower or higher than expected, the numbers of copies obtained for each form should tend to be equal. Step 2-B below helps to achieve this objective.

The assignment plan used is as follows. Initially we let $m_j = n_j$, $d_j = c_{jn_j}$, $s_{jk} = r_{jk}$, and $u_k = t_k$. The numbers $m_j$, $u_k$, and $s_{jk}$ represent the current counts, at each stage of the plan, of the numbers of assignments yet to be made respectively (1) within group j, (2) involving form k, and (3) both within group j and involving form k. The quantity $d_j$ is the probability ($c_{ji}$) corresponding to the next assignment that will be made within group j. In each of the stages of the plan, at most N in number, we:

(1) Pick j, subject to $m_j > 0$, with $d_j$ smallest (break ties, unlikely here, by taking the smallest possible value of j). Thus we make the assignment corresponding to the smallest probability $c_{ji}$ for which no assignment has yet been made. (In our example, this initially is, for group 2 and priority 6, $c_{26} = .118$.)

(2) Let $f_{ji} = h$ according to the following criteria:

(A) $s_{jk}$ is at a maximum for k = h;

(B) subject to (A), $u_k$ is at a maximum for k = h;

(C) subject to (A) and (B), the sum $w_k$ of probabilities already associated with form k is at a maximum for k = h.

If $t_k = u_k$, then $w_k = 0$; otherwise, $w_k = \sum_{i=1}^{t_k - u_k} v_{ki}$. If there is a tie for the maximum of $w_k$ (unlikely unless this maximum is 0), we may break the tie by taking h equal to the smallest tied value of k.

Corresponding to the group (j) and priority (i) determined in step 1, we thus assign a copy of form h where: (A) the number of copies of form k yet to be assigned within group j is at a maximum for k = h (in our example, within group 2 there are initially 2 remaining copies of forms 2 and 3 and 1 copy of forms 1 and 4, so that h must equal 2 or 3); (B) subject to step 2-A, the number of copies of form k remaining in all groups is maximized for k = h (for k = 2 and k = 3 this number is 3, so that 2 and 3 are still "tied" as candidates for h); and (C) subject to steps 2-A and 2-B, the sum of probabilities $c_{ji}$ corresponding to assignments already made for form k is maximized for k = h (this sum is 0 initially for all k, so that we break the tie between 2 and 3 by letting h = 2).
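Step 2 amounts to taking a lexicographic maximum over $(s_{jk}, u_k, w_k)$, with any remaining tie going to the smallest k. A minimal sketch, with the tallies held in dictionaries keyed by form number (a representation chosen here for illustration), checked against the first two stages of the example:

```python
def choose_form(s_j, u, w):
    """Return h maximizing s_j[k] (criterion A), then u[k] (B), then w[k] (C).

    max() returns the first maximal item it meets, so scanning k in
    increasing order breaks any remaining tie in favor of the smallest k."""
    return max(sorted(s_j), key=lambda k: (s_j[k], u[k], w[k]))


# First stage of the example: group 2, priority 6.
s_2 = {1: 1, 2: 2, 3: 2, 4: 1}
u = {1: 3, 2: 3, 3: 3, 4: 2}
w = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}
print(choose_form(s_2, u, w))
```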

(3) Add $c_{ji}$ to $w_h$; then reduce $m_j$, $s_{jh}$, and $u_h$ by 1. Here we are keeping our tallies current. (In our example, $w_2$, $m_2$, $s_{22}$, and $u_2$ initially were 0, 6, 2, and 3 respectively; now they are .118, 5, 1, and 2.)

(4) If $m_j > 1$, let $d_j = c_{jm_j}$; if $m_j = 1$, make the last assignment (now known) for group j, and do as in (3). (In our example, $m_2 = 5 > 1$, so that we just let $d_2 = c_{25} = .420$.)

(5) Go back to step 1 unless all quantities $m_j$ are zero, in which case we are done.
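The five steps can be put together as below. This is a sketch under its own conventions, not the author's program: groups and priorities are 0-based inside lists, forms are numbered 1..K, $w_k$ is accumulated directly rather than through the $v_{kh}$, and $d_j$ is read off as $c_{jm_j}$ instead of being stored. The initial $s_{jk}$ and $u_k$ come from the cyclic stack described earlier.

```python
def assign_forms(c, K):
    """Five-step assignment plan (sketch).

    c[j][i - 1] is c_{j,i} for group j (0-based) and priority i (1-based).
    Returns (f, E, stages): f[j][i - 1] is the form given priority i in group j,
    E[k - 1] is the expected number of copies of form k filled (3 decimals),
    and stages counts passes through step 1."""
    J = len(c)
    n = [len(row) for row in c]
    # Initial tallies from the cyclic stack 1-2-...-K-1-2-..., dealt group by group.
    stack = [pos % K + 1 for pos in range(sum(n))]
    starts = [sum(n[:j]) for j in range(J)]
    s = [{k: stack[starts[j]:starts[j] + n[j]].count(k) for k in range(1, K + 1)}
         for j in range(J)]                            # s_jk
    u = {k: stack.count(k) for k in range(1, K + 1)}   # u_k
    w = {k: 0.0 for k in range(1, K + 1)}              # w_k
    m = n[:]                                           # m_j
    f = [[0] * n_j for n_j in n]

    def record(j, h):
        # Step 3: form h takes priority m_j in group j; update the tallies.
        f[j][m[j] - 1] = h
        w[h] += c[j][m[j] - 1]
        m[j] -= 1
        s[j][h] -= 1
        u[h] -= 1

    stages = 0
    while any(m):                                      # step 5
        stages += 1
        # Step 1: group with m_j > 0 and smallest d_j = c_{j,m_j};
        # min() keeps the first (smallest) j on ties.
        j = min((g for g in range(J) if m[g] > 0), key=lambda g: c[g][m[g] - 1])
        # Step 2: criteria (A), (B), (C); max() keeps the smallest k on ties.
        h = max(range(1, K + 1), key=lambda k: (s[j][k], u[k], w[k]))
        record(j, h)
        # Step 4: with one priority left, its form is already determined.
        if m[j] == 1:
            record(j, next(k for k in range(1, K + 1) if s[j][k] > 0))
    E = [round(w[k], 3) for k in range(1, K + 1)]
    return f, E, stages


c = [[1.000, .993, .942, .737, .328], [.999, .989, .930, .744, .420, .118]]
f, E, stages = assign_forms(c, K=4)
print(f, E, stages)
```

Applied to the example's three-decimal table with K = 4, it gives the assignment table $f_{ji}$ shown earlier, $E_k$ = 2.065, 2.048, 2.146, 1.941, and 9 stages.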

In the second stage of our example, in step 1 we choose j = 1, corresponding to i = $n_1$ = 5 and $c_{15} = .328$. In step 2-A we find that h = 1, since within group 1, 2 assignments remain for form 1 and 1 assignment remains for each of the other 3 forms. Continuing according to the above steps, the assignments are completed in 9 stages (with the help of step 4).

Comments on this assignment plan are in order, as follows:

Step 1 provides for assignment of forms to priorities in order from least likely to be handed out to most likely. The restriction $m_j > 0$ keeps us from trying to make assignments for a class where all $n_j$ priorities have already been filled.

Step 2-A helps provide that within a given group the different forms will tend to be uniformly distributed over priorities: if we omit (2-A), a particular form might appear unduly often in the high priorities of a particular group. Note that the values $s_{jk}$ for fixed j initially differ among themselves by at most 1, since each group receives a consecutive run of copies from the cyclic stack.

Step 2-B helps provide, subject to (A), that each assignment will involve a form for which a maximum number of assignments ($u_k$) remain to be made, and thus helps to make uniform for all forms the spread of assignments across the distribution of probabilities $c_{ji}$, from smallest to largest.

Step 2-C picks, subject to (A) and (B), the form k for which $w_k$ is largest. Since the current assignment involves the smallest probability $c_{ji}$ of all those remaining to be assigned, this assignment should go to the form with the largest sum $w_k$ of probabilities already associated with it; because of (2-B), the number of future assignments is the same for every form still under consideration.

In the actual experiment there were 17 groups with binomial parameters $n_j$ and $p_j$ as follows:

    j      1     2     3     4     5     6     7     8     9
    n_j   37    23    11    15    20    13    14     8    24
    p_j  .65   .65    .3   .85    .8   .55    .3    .5   .65

    j     10    11    12    13    14    15    16    17
    n_j   16    16    11    13    43    10    17    23
    p_j   .5    .7    .8    .3    .3    .3    .6    .6

Groups 3-4 and 7-8 were combined, with the combined groups assumed to have distributions based on the sum of two binomial variates. Thus we had J = 15. For K = 8 we obtained, to three decimal places,


    k        1       2       3       4       5       6       7       8
    E_k   21.827  21.827  21.818  21.824  21.827  21.826  21.825  21.827;

for K = 16 we obtained, to three decimal places,

    k        1       2       3       4       5       6       7       8
    E_k   10.927  10.935  10.950  10.933  10.948  10.948  10.945  10.924

    k        9      10      11      12      13      14      15      16
    E_k   10.744  10.947  10.903  10.907  10.916  10.892  10.888  10.899.
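For the combined groups 3-4 and 7-8, the attendance distribution of a sum of independent binomial variates can be obtained by convolving their probability functions, after which each $c_{ji}$ is again an upper-tail sum. A minimal sketch, assuming the two original groups attend independently; the text does not spell out how the computation was done, and the function names are illustrative.

```python
from math import comb


def binomial_pmf(n, p):
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def combined_at_least(parts):
    """c_i = P(X_1 + X_2 + ... >= i) for independent X_g ~ Binomial(n_g, p_g),
    where parts is a list of (n_g, p_g); returns [c_1, ..., c_n], n = sum of n_g."""
    pmf = [1.0]  # distribution of an empty sum: P(0) = 1
    for n_g, p_g in parts:
        part_pmf = binomial_pmf(n_g, p_g)
        new = [0.0] * (len(pmf) + n_g)
        for x, px in enumerate(pmf):
            for y, py in enumerate(part_pmf):
                new[x + y] += px * py
        pmf = new
    return [sum(pmf[i:]) for i in range(1, len(pmf))]


c_34 = combined_at_least([(11, 0.3), (15, 0.85)])  # groups 3 and 4
c_78 = combined_at_least([(14, 0.3), (8, 0.5)])    # groups 7 and 8
print(len(c_34), len(c_78), round(sum(c_34), 3), round(sum(c_78), 3))
```

As a check, the $c_{ji}$ of a group sum to its expected attendance, here $11(.3) + 15(.85) = 16.05$ and $14(.3) + 8(.5) = 8.2$.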


The results for K = 16 were actually used. The relatively small variation among the quantities $E_k$ suggests that the method works well for given probabilities $c_{ji}$ (and possibly accompanying $p_j$), and that the chief problems are in estimating $c_{ji}$ and $p_j$ accurately. In our experiment the estimates of these quantities turned out to be far too high. However, from the 102 copies actually filled we obtained at least five of each of the 16 forms (at least six would have been too much to hope for), so that 80 of the 102 copies could be used in the analysis.